Sampling Variability – The Heart of Inference

Salaries of football coaches

In your team…

First

  • sample 10 salaries
  • calculate the median

Then

  • sample 25 salaries
  • calculate the median

You have a sample of 25 UC & CSU coach salaries.


Would you feel comfortable inferring that the median salary of your sample is close to the median salary of all UC & CSU coaches?


Why or why not?

Why sample more than once?

Variability is a central focus of the discipline of Statistics!

Making decisions based on limited information is uncomfortable!

You likely weren’t willing to infer the population median salary from your sample!

Sampling Framework

population – collection of observations / individuals we are interested in

population parameter – numerical summary about the population that is unknown but you wish you knew


sample – a collection of observations from the population

sample statistic – a summary statistic computed from a sample that estimates the unknown population parameter
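The four terms above can be lined up in a few lines of base R. This is only a sketch: the population below is simulated (252 made-up salaries from a log-normal distribution), not the real coach data, and every object name is hypothetical.

```r
# Hypothetical population: 252 simulated salaries (NOT the real coach data).
set.seed(1)
population <- round(rlnorm(252, meanlog = 11.8, sdlog = 0.8))

# Population parameter: normally unknown. We can compute it here only
# because we simulated the entire population ourselves.
parameter <- median(population)

# Sample: 25 observations drawn at random, without replacement.
one_sample <- sample(population, size = 25)

# Sample statistic: the point estimate of the population parameter.
statistic <- median(one_sample)

c(parameter = parameter, statistic = statistic)
```

In a real study only `one_sample` and `statistic` would be available; the parameter is what we are trying to learn about.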

Statistical Inference

In 2019, there were 252 “Head Coaches” at University of California (UC) and California State University (CSU) campuses


Median salary for all coaches: $137,619

Using information from your sample to draw conclusions about the population is called statistical inference.

Statistical Inference Reasoning

  • If the sampling is done at random,
  • then the sample is representative of the population,
  • so any result based on the sample can generalize to the population,
  • and the point estimate is a “good guess” of the unknown population parameter.



Shouldn’t one random sample be enough then? Isn’t that what we use to make confidence intervals and do hypothesis tests?

Virtual Sampling

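library(infer)  # provides rep_sample_n()

# Draw one virtual sample of 25 rows from the coaches data frame
rep_sample_n(coaches, 
             size = 25, 
             reps = 1, 
             replace = TRUE)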


Employee Name        Job Title                     Total Pay & Benefits
Lyndsie Olson        Intercol Ath Head Coach Ex    18075.0
Nancy Mcdaniel       Head Coach 5                  181157.0
Sam Crosson          Head Coach 5                  225184.0
Lauren Beth Nadler   Head Coach 5                  44714.0
Sandra S Kriezel     HEAD COACH - 12 MONTH         120699.5
Charles Kelly        Intercol Ath Head Coach Ex    3571805.0
⋮

Distribution of 1000 medians from samples of 25 coaches

Sampling Distributions

  • Visualize the effect of sampling variation on the distribution of any point estimate
    • In this case, the sample median
  • We can use sampling distributions to make statements about what values we can typically expect.
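A sampling distribution of the median can be approximated by repeating the sampling step many times. The sketch below uses base R on a simulated population (made-up salaries, not the real coach data); it samples with replacement to mirror the `replace = TRUE` setting in the call shown earlier, and all object names are hypothetical.

```r
# Hypothetical population: 252 simulated salaries (NOT the real coach data).
set.seed(2)
population <- round(rlnorm(252, meanlog = 11.8, sdlog = 0.8))

# Draw 1000 samples of 25 salaries and keep the median of each one.
sample_medians <- replicate(
  1000,
  median(sample(population, size = 25, replace = TRUE))
)

# Values we can "typically expect": the middle 95% of the sample medians.
typical_range <- quantile(sample_medians, probs = c(0.025, 0.975))
typical_range

# To look at the shape of the sampling distribution, one option is:
# hist(sample_medians)
```

Each element of `sample_medians` summarizes a whole sample, so this vector describes how the statistic varies from sample to sample rather than how individual salaries vary.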

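Be careful! A sampling distribution is different from a sample’s distribution! A sample’s distribution shows the individual values within one sample; a sampling distribution shows how a statistic (here, the median) varies across many samples.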

Distributions of 1000 medians from different sample sizes

What differences do you see?

Variability for Different Sample Sizes

Sample Size   Standard Error of Median ($)
25            19273.031
50            13018.543
100           7849.817

  • Standard errors quantify the variability of point estimates

  • As a general rule, as sample size increases, the standard error decreases.
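The general rule can be checked by simulation. This sketch estimates the standard error of the median as the standard deviation of 1000 sample medians, for each of the three sample sizes. It runs on a simulated population (made-up salaries, not the real coach data), so its numbers will not match the table above; `se_of_median` is a hypothetical helper.

```r
# Hypothetical population: 252 simulated salaries (NOT the real coach data).
set.seed(3)
population <- round(rlnorm(252, meanlog = 11.8, sdlog = 0.8))

# Hypothetical helper: the standard deviation of many sample medians
# is a simulation-based estimate of the standard error of the median.
se_of_median <- function(n, reps = 1000) {
  medians <- replicate(
    reps,
    median(sample(population, size = n, replace = TRUE))
  )
  sd(medians)
}

sizes <- c(25, 50, 100)
standard_errors <- vapply(sizes, se_of_median, numeric(1))

data.frame(sample_size = sizes, standard_error = standard_errors)
```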

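Careful! There are important differences between standard errors and standard deviations. A standard deviation describes how much individual observations vary; a standard error describes how much a point estimate varies from sample to sample.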

A good guess?

Precision & Accuracy

  • Random sampling makes our point estimates accurate: on average, they land near the population parameter.


  • Larger sample sizes make our point estimates more precise: they vary less from sample to sample.
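One way to see the two ideas separately is to compare random samples with samples from a deliberately biased scheme. In this sketch the population is simulated (made-up salaries, not the real coach data), and the biased scheme is an invented one in which only the lower-paid half of the population can ever be sampled. The helper `summarise_medians` is hypothetical.

```r
# Hypothetical population: 252 simulated salaries (NOT the real coach data).
set.seed(4)
population <- round(rlnorm(252, meanlog = 11.8, sdlog = 0.8))
true_median <- median(population)

# Invented biased scheme: only the 126 lowest-paid people are reachable.
reachable <- sort(population)[1:126]

# Hypothetical helper: average miss (accuracy) and sample-to-sample
# spread (precision) of the median over many samples of size n.
summarise_medians <- function(pool, n, reps = 1000) {
  medians <- replicate(reps, median(sample(pool, size = n)))
  c(bias = mean(medians) - true_median, spread = sd(medians))
}

results <- rbind(
  random_n25  = summarise_medians(population, 25),
  random_n100 = summarise_medians(population, 100),
  biased_n100 = summarise_medians(reachable, 100)
)
results
```

Going from 25 to 100 random draws should shrink `spread` (better precision). The biased scheme keeps a large `bias` no matter how big the sample is: a larger sample cannot repair a sampling method that is not random.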

Sampling Activity!